Distributional Term Representations for Short-Text Categorization

Authors

  • Juan Manuel Cabrera
  • Hugo Jair Escalante
  • Manuel Montes-y-Gómez
Abstract

Every day, millions of short texts are generated for which effective organization and retrieval tools are required. Because these documents are very short and their representations extremely sparse, directly applying standard text categorization methods is not effective. In this work we propose using distributional term representations (DTRs) for short-text categorization. DTRs represent terms by means of contextual information, given by document occurrence and term co-occurrence statistics. They therefore allow us to build enriched document representations that help to overcome, to some extent, the small-length and high-sparsity issues. We report experimental results on three challenging collections, using a variety of classification methods. These results show that using DTRs improves classification performance in short-text categorization.
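As a rough illustration of the idea in the abstract, the following minimal Python sketch builds term vectors from term co-occurrence counts and then represents a short document as the frequency-weighted sum of its term vectors. The toy corpus, the function names, the raw-count weighting, and the L2 normalization are all assumptions made for this example; the abstract does not specify the weighting or combination scheme the authors actually use.

```python
from collections import Counter

def build_vocab(docs):
    """Sorted list of all distinct terms in the tokenized corpus."""
    return sorted({term for doc in docs for term in doc})

def term_cooccurrence_vectors(docs, vocab):
    """Map each term to a vector counting the documents it shares with
    every vocabulary term (assumption: raw counts, no further weighting).
    The entry for the term itself equals its document frequency."""
    index = {term: i for i, term in enumerate(vocab)}
    vectors = {term: [0.0] * len(vocab) for term in vocab}
    for doc in docs:
        present = set(doc)
        for term in present:
            for other in present:
                vectors[term][index[other]] += 1.0
    return vectors

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are unchanged)."""
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec] if norm else vec

def document_vector(doc, term_vectors, dim):
    """Sum the vectors of the document's terms, weighted by term frequency.
    Terms without a vector (unseen in the corpus) are skipped."""
    out = [0.0] * dim
    for term, tf in Counter(doc).items():
        vec = term_vectors.get(term)
        if vec is None:
            continue
        for i, x in enumerate(vec):
            out[i] += tf * x
    return l2_normalize(out)

# Hypothetical toy corpus of already-tokenized short texts.
docs = [
    ["cheap", "flight", "deal"],
    ["flight", "delay", "airport"],
    ["cheap", "hotel", "deal"],
]
vocab = build_vocab(docs)
term_vectors = term_cooccurrence_vectors(docs, vocab)

# A two-word text has only two nonzero bag-of-words entries, but its
# co-occurrence-based vector also activates contextually related terms.
doc_vec = document_vector(["cheap", "flight"], term_vectors, len(vocab))
nonzero = sum(1 for x in doc_vec if x > 0)
```

In this toy setting the two-word text ends up with a nonzero weight for every vocabulary term, which is the kind of densification the abstract refers to when it mentions overcoming sparsity. A document-occurrence variant would instead index each term vector by the documents in which the term appears.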

Similar resources

Text Categorization

Text categorization is the task of assigning predefined categories to natural language text. With the widely used “bag-of-words” representation, previous research usually assigns each word a value that expresses whether the word appears in the document concerned or how frequently it appears. Although these values are useful for text categorization, they have not fully expressed the abunda...

A signal processing approach to distributional clustering of terms in automatic text categorization

Distributional clustering has been shown to be an effective and powerful approach to term extraction aimed at reducing the original term space dimensionality for Automatic Text Categorization [1]. In this paper we propose a new method for distributional clustering based not on information-theoretic methods, as previous authors have done, but on a new Signal Processing interpretation that implies th...

Learning Grounded Meaning Representations with Autoencoders

In this paper we address the problem of grounding distributional representations of lexical meaning. We introduce a new model which uses stacked autoencoders to learn higher-level embeddings from textual and visual input. The two modalities are encoded as vectors of attributes and are obtained automatically from text and images, respectively. We evaluate our model on its ability to simulate sim...

Identifying Lexical Relationships and Entailments with Distributional Semantics

As the field of Natural Language Processing has developed, research has progressed on ambitious semantic tasks like Recognizing Textual Entailment (RTE). Systems that approach these tasks may perform sophisticated inference between sentences, but often depend heavily on lexical resources like WordNet to provide critical information about relationships and entailments between lexical items. Howe...

Cognitively Motivated Distributional Representations of Meaning

Although meaning is at the core of human cognition, state-of-the-art distributional semantic models (DSMs) are often agnostic to the findings in the area of semantic cognition. In this work, we present a novel type of DSM motivated by the dual-processing cognitive perspective that is triggered by lexico-semantic activations in the short-term human memory. The proposed model is shown to perform...

Publication date: 2013